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S Motivated by a synchronization problem in distributed computing we studied a simple growth 
model on regular and small-world networks, embedded in one and two- dimensions. We find that 
the synchronization landscape (corresponding to the progress of the individual processors) exhibits 
Kardar-Parisi-Zhang-like kinetic roughening on regular networks with short-range communication 
links. Although the processors, on average, progress at a nonzero rate, their spread (the width of 
the synchronization landscape) diverges with the number of nodes (desynchronized state) hindering 
efficient data management. When random communication links are added on top of the one and 
two-dimensional regular networks (resulting in a small- world network), large fluctuations in the 
synchronization landscape are suppressed and the width approaches a finite value in the large 
system-size limit (synchronized state). In the resulting synchronization scheme, the processors make 
close-to-uniform progress with a nonzero rate without global intervention. We obtain our results 
by "simulating the simulations", based on the exact algorithmic rules, supported by coarse-grained 
arguments. 



> ■ I. INTRODUCTION 

oo 

The study of complex networks pervades various areas of science ranging from sociology to statistical physics PH2>H|. 
I , A network in terms of modeling can be defined as a set of items, referred to as nodes with links connecting them. 
Examples of real life complex networks include the Internet 0, , the World Wide Web , metabolic networks , 
\Q \ transportation networks H El , and social networks 01 ■ 

Regular lattices are commonly used to study physical systems with short-range or long-range interactions. Earlier 
network studies focused mostly on the topological properties of the networks. Recent works, motivated by a large 
number of natural and artificial systems, such as the ones listed above, have turned the focus onto processes on 
networks, where the interaction and dynamics between the nodes are facilitated by a complex network. The question 
then is how the collective behavior of the system is influenced by this possibly complex interaction topology. Watts 
and Strogatz, inspired by a sociological experiment [ill ], have proposed a network model known as the small- world 
(SW) network |l2|. The SW concept describes the observation that, despite their often large size, there is a relatively 
short path between any two nodes in most networks with some degree of randomness. The SW model was originally 
constructed as a network to interpolate between regular lattices and completely random networks [l3|. Systems 
• i-h , and models (with well known behaviors on regular lattices) have been studied on SW networks, such as the Ising 
X ' model B El 5 1 E^ jth e XY model [3, phase ordering [ig|, the Edwards- Wilkinson model [Sa, H3, 123, diffusion 
[2fll2lll2a . l23 . l24l25ll26ll2^| . and resistor networks (28|. Closely related to phase transitions and collective phenomena 
is synchronization in coupled multi-component systems . SW networks have been shown to facilitate autonomous 
synchronization which is an important feature of these networks from both fundamental and system-design points of 
view |30l l3ll l32] | . In this paper we study a synchronization problem which emerges |33l ] in certain parallel distributed 
algorithms referred to as parallel discrete-event simulation (PDES) [2. IH El E3 • First, we find that constructing a 
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S W-like synchronization network for PDES can have a strong impact on the scalability of the algorithm 38] . Secondly, 
since the particular problem is effectively "local" relaxation in a noisy environment in a SW network, our study also 
contributes to the understanding of collective phenomena on these networks. 

Simulation of large spatially extended complex systems in physics, engineering, computer science, or military appli- 
cations require vast amount of CPU-time on serial machines using sequential algorithms. PDES enabled researchers 
to implement faithful simulations on parallel/distributed computer systems, namely, systems composed of multiple 
interconnected computers |34Ll35ll36ll37| . PDES is a subclass of parallel and distributed simulations in which changes 
in the components of the system occur instantaneously from one state to another. These changes are referred to as 
discrete events, e.g., spin- flip attempts in magnetic Ising models with Glauber dynamics. The primary motivation 
in PDES is to perform parallel simulation for large systems without altering the original physical dynamics. In the 
physics, chemistry and biology communities these types of simulations are most commonly referred to as dynamic 
or kinetic Monte Carlo simulations. Examples for real-life complex systems where discrete-event models are appli- 
cable include cellular communication networks 13(1 13^1 , magnetic systems 0, EH , spatial epidemic models [ToL l42f , 
thin-film growth (^3, EJ E3 , battle-field models |4fij. and internet traffic models 0] • In the above examples the 
discrete events are call arrivals, spin-flip attempts, infections, monomer depositions, troop movements, and packet 
transmissions/receptions respectively. 

PDES has two basic ingredients: the set of local simulated times of the processors or processing elements (PE) (also 
referred to as virtual times |4^) and a synchronization scheme |34[ . The difficulty in PDES is that the discrete events 
are not synchronized by a global clock since the dynamic is usually asynchronous. The challenge is algorithmically 
parallelizing the physically non-parallel dynamics, while enforcing causality between events and reproducibility. There 
are two main approaches in PDES: (i) conservative synchronization, which avoids the possibility of any typ e of 
causality errors by checking the causality relation between potentially related events at every update attempt p3 
and (ii) optimistic synchronization, which allows possible causality errors, then later initiates rollbacks to correct the 
erroneous computations |48l l5l| . Innovative methods have also been introduced to make optimistic synchronization 
more efficient, such as reverse computation j^. Other recent improvements to exploit parallelism in discrete-event 
systems are the "lookback" method jS^I and the freeze-and-shift algorithm |54j . 

As the number of available PEs on parallel architectures increases to tens of thousands |55j, and high-performance 
grid-computing networks emerge |5a . l57j , fundamental questions of the scalability of the underlying algorithms must 
be addressed. A PDES should have the following properties to be scalable 39]: (i) the virtual time horizon formed 
by the virtual times of the PEs should progress on average with a non-zero rate and (ii) the typical spread (width) 
of the time horizon should be bounded as the number of PEs, Npe, goes to infinity. The latter condition becomes 
important to avoid long delays while waiting for "slow" nodes |43( or, alternatively, to eliminate the need to reserve a 
large amount of memory for temporary data storage. In this paper we study regular and SW network communication 
topologies and show a possible way to construct fully scalable parallel algorithms for systems with asynchronous 
dynamics and short-range interactions on regular lattices. 

In order to understand scalability and synchronizability of PDES, we consider the parallel simulation itself as a 
complex interacting system where the specific synchronization rules correspond to the "microscopic dynamics" . A 
similar approach was also successful to establish a connection 58] between rollback-based (or optimistic) schemes 
|48[ and self-organized criticality |59l l60j | . Our approach exploits a mapping between non-equilibrium surface growth 
and the evolution of the simulated time horizon [33l l6l| so that we can use the tools and framework of statistical 
mechanics. A similar analogy between phase transitions and computational complexity has turned out to be highly 
fruitful to gain more insight into traditional hard computational problems [62l Jo. In this paper we consider the 
scalability of conservative synchronization schemes for self- initiating processes |64l l65| , where update attempts on 
each node are modeled as independent Poisson streams. We study the morphological properties of the simulated time 
horizon (or synchronization landscape). Through our study one also gains some insight into the effects of S W-like 
interaction topologies on the critical fluctuations in interacting systems. 

This paper, in part, is an expanded version of our earlier works on one-dimensional (ID) |33J and ID SW (SW 
embedded in one dimension) synchronization networks |38l ]. Further, we present new results for the two-dimensional 
(2D), and for the 2D SW (SW embedded in two dimensions) synchronization networks. The paper is organized as 
follows. In Section II wc show detailed results for a previously studied short-range model with nearest neighbor 
communication which we refer to as the basic conservative synchronization (BCS) scheme in one dimension |33j. In 
Section III we present results on SW networks constructed by adding random links on a regular one-dimensional 
network [3^]. Section IV and Section V present the results on the BCS scheme in two dimensions and its SW version, 
respectively. In Section VI we summarize our work and discuss results. 
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II. BASIC CONSERVATIVE SYNCHRONIZATION SCHEME IN ID 

First, we briefly summarize the basic observablcs relevant to our analysis for synchronization and their scaling rela- 
tions borrowed from non-equilibrium surface growth theory. The set of local simulated times for the PEs, {Tj(£)} i= ^ E , 
constitutes the simulated time horizon. Here Npe is the number of PEs and t is the discrete number of parallel 
steps, proportianal to the real/ wall-clock time. On a regular d-dimensional hypercubic lattice NpE=N d , where N is 
the linear size of the lattice. For a one-dimensional system Npg=N. For the rest of the paper we will use the term 
"height" , "simulated time" , or "virtual time" interchangeably, since we refer to the same local observable (local field 
variable). 

Since the discrete events in PDES are not synchronized by a global clock, the processing elements have to synchronize 
themselves by communicating with others. One of the first approaches to this problem for self-initiating processes 

0. 16^ is the BCS scheme proposed by Lubachevsky |6^.l6^| bv using only nearest neighbor interactions, mimicking 
the interaction topology of the underlying physical system [33. His basic model associates each component or site 
with one PE (worst-case scenario) under periodic boundary conditions. In this BCS scheme, at each time step, only 
those PEs whose local simulated time is not larger than the local simulated times of their next nearest neighbors are 
incremented by an exponentially distributed random amount so that the discrete events exhibit Poisson asynchrony. 
Otherwise (if the local simulated time of the PE is larger than any of its neighbors' simulated time), no update occurs, 

1. e., the PE idles. The evolution equation for the local simulated time of site i simply becomes 

T i (t + l)=T i (t)+1 li (t)Q(T i - 1 (t)-Ti(t))G(T i+1 (t)-T i (t)) , (1) 

where r)i(t) is an exponentially distributed random number and 0(---) is the Heaviside step- function. In one-dimension 
with periodic boundary conditions, the network has a ring topology as shown in Fig. ^a), so each node is connected 
to the nearest left and right neighbors. The nearest-neighbor interaction in the BCS scheme implies that in order to 
ensure causality, PEs need to exchange their local simulated (virtual) times only with neighboring PEs in the virtual 
network topology. The possible configurations for the local simulated times for the successive nodes are shown in 
Fig. [21 In these configurations an update occurs only if the node we are considering (node i) is a local minima. In 
the other three cases the node i idles. The algorithm is obviously free from deadlock, since at worst, the PE with 
the absolute minimum local simulated time can make progress. Note that from an actual parallel implementation 
viewpoint, equal virtual times (and hence conflicting updates) are extremely unlikely (to the extent of the resolution 
of generating continuously distributed exponential variables), and the algorithm allows updates if the neighboring 
virtual times are greater than or equal to that of PE i. From a theoretical viewpoint, for t>0, the probability density 
of the simulated time horizon {ri(i)}^ E is a continuous measure, so the probability that the simulated times of any 
two nodes are the same is of measure zero. 

In analyzing the performance of the above scheme, it is helpful that the progress of the simulation itself is decoupled 
from the possibly complex behavior of the underlying system. This is contrary to optimistic approaches, where the 
evolution of the underlying system and the progress of the PDES simulation are strongly entangled y58|, making 
scalability analysis a much more difficult task. 




FIG. 1: (a) One-dimensional (ID) regular network (with periodic boundary conditions), where nodes are connected to their 
nearest neighbors, (b) Small-world (SW) synchronization network, where each node is connected to exactly one randomly 
chosen node, in addition to the nearest neighbors. 

One of the important aspects of conservative PDES is the theoretical efficiency or utilization which is defined as the 
average fraction of non-idling PEs at each parallel step. This measure also provides the average progress rate of the 
simulation, as can be seen from Eq. In the BCS scheme, where only nearest-neighbor interactions are present, the 
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utilization is equal to the density of local minima in the simulated time horizon. Thus, the utilization (on a regular 
one-dimensional lattice) can be written as 

(«) = (e(Ti_i - 7?)e(7i+i - n)) = (e(-&_i)e(&)> , (2) 

where fa = rj+i — Tj is the local slope, 0(...) is the Heaviside step function, and (...) denotes an ensemble average 
over the noise in Eq. (JTJ. The utilization is independent of % for a system of identical PEs due to the translational 
invariance. 
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FIG. 2: Possible simulated-time configurations for three successive nodes (involving two successive slopes) in the basic conser- 
vative scheme (BCS) in one dimension. From the perspective of node i, only configuration (b) allows it to proceed (node i is a 
local minimum). In all other cases causality could be violated if an update occurs at site i, because the local field variables of 
the neighboring nodes are not known at the instant of update attempt. 



Another important observable of PDES is the statistical spread or width of the simulated time surface. The width 
of the simulated time surface w is defined as the root mean square of the virtual times with respect to the mean, 
w = y/ (w 2 ), where 

<tfl a > = <^JV,i)> = / ] ^2[r i (i)-f(t)] a ^ , (3) 

and f(t)=(l/N ) Yli=i T i{t) ^ s the mean progress ("mean height") of the time surface. 

Since we use the formalism and terminology of non-equilibrium surface growth phenomena, we briefly review scaling 
concepts for self-affine or rough surfaces. The scaling behavior of the width, (w 2 (N, t)), alone typically captures and 
identifies the universality class of the non-equilibrium growth process |68l l69t l7Cj . In a finite system the width 
initially grows as (w 2 (N, i))~i 2/3 , and after a system-size dependent cross-over time t x ^N z , it reaches a steady-state 
(w 2 (N, i))~iV 2a for t^>t x . In the relations above a, (3, and z=% are called the roughness, the growth, and the dynamic 
exponent, respectively. The temporal and system-size scaling of the width can be captured by the Family- Vicsek fn\ 
relation, 

(w 2 (N,t)) =N 2a f(t/N z ) . (4) 

Note that the scaling function f(x) depends on t and the linear system-size N only through the specific combination 
x = t/N z , reflecting the importance of the crossover time t x . For small values of its argument f(x) behaves as a 
power law, while for large arguments it approaches a constant, 

f ( \ \ ^ x^^X 

I[X) ~ \ const, if a>l ' 

yielding the correct scaling behavior of the width for early times and for the steady-state (x^>l). 

A somewhat less frequently studied quantity is the growth rate of a growing surface. This quantity is typically 
non-universal |33, |22> LLJ IzH uH lZ3 , but as was shown by Krug and Meakin [73, on d-dimensional regular lattices, 
the finite-size corrections to it are. In the context of the basic PDES scheme, the growth rate of the simulated time 
surface corresponds to the progress rate (or utilization) of the simulation, hence our special interest in this observable. 
For the finite-size behavior of the steady-state growth rate, one has [73| 
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FIG. 3: Virtual time horizon snapshots for 10,000 sites in ID. (a) For the regular network with nearest-neighbor connections 
(p=0). The lateral correlation length £ and width w = \fuP are shown for illustration in the graph. The rough steady-state 
surface belongs to the KPZ/EW universality class. For the SW synchronization network with (b) p=0.1 and (c) p—1 the heights 
are effectively decorrelated and both the correlation length and the width are reduced, and approach system-size independent 
values for sufficiently large systems. The resulting surface is macroscopically smooth. Note that the heights are relative to the 
average height. 



where (u(oo)) is the value of the growth rate in the asymptotic infinite system-size limit and a is the dimension- 
dependent roughness exponent of the growth process. 

Based on a mapping between virtual times and surface heights |3.lj | . in the coarse-grained description, the virtual 
time horizon of the BCS was found to be governed by the Kardar-Parisi-Zhang (KPZ) equation j^l, well-known 

in surface growth phenomena 

dtfi = V 2 f, - A(Vf,) 2 + ... + n(t) , (7) 

where V 2 stands for the discretized Laplacian, V 2 fi = fj+i + r,_i — 2fi, V is the discrctizcd gradient operator, 
Vfj = fj+i — Tj, fj = Tj — f is the surface height fluctuation (or virtual time) measured from the mean, rji(t) is a 
noise delta-correlated in space and time, (Vi{t)Vj(t')) = 2D5ij8{t — t'), A is a positive constant and ... stands for 
higher order irrelevant terms (in a renormalization group sense). Equation J7J can also give an account of a number 
of other nonlinear phenomena such as Burgers turbulence and directed polymers in random media |68j . When the 
simple update rule of the basic synchronization scheme is implemented on a one-dimensional regular network, one can 
observe a simulated time surface governed by the KPZ equation, and in the steady-state, by the Edwards- Wilkinson 
Hamiltonian |z3 [Fig.Ofa)]. 

When analyzing the statistical and morphological properties of the stochastic landscape of the simulated times, it 
is convenient to study the height-height correlation or its Fourier transform, the height-height structure factor. The 
equal-time height-height structure factor S(k,t) in one-dimension is defined as 

S(k,t)NS k ,- k > = (f k {t)fk'(t)) , (8) 

where f k — e^ lk ^Tj is the Fourier transform of the virtual times with the wave number k—2nn/N, 

n=0, 1, 2, N — 1 and 5k. -k' is the Kronecker delta. The structure factor essentially contains all the "physics" 
needed to describe the scaling behavior of the time surface. Here we focus on the steady-state properties (t— s-oo) of 
the time horizon where the structure factor becomes independent of ti me, lim^oo S(k,t) = S(k). In the long-time 
limit in one dimension, for a KPZ surface described by Eq. Q one has |72| 

S(k) = -f — — 7TTT ~ -ri , (9) 

v ; 2[1 - cos(fc)] k 2 ' w 

where the latter approximation holds for small values of k. By performing the inverse Fourier transformation of 
Eq. we can also obtain the spatial two-point function, 

G{l) = {l/N)Y,e lkl S{k), (10) 

fe^O 
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for 1 <C I <C N. In particular, for the steady-state width one finds 

< w2 > = jf E 5 ( fc ) = G (°) - % N ~ N ( 12 ) 

in one dimension [72| . This divergent width is caused by a divergent length scale, £, the "lateral" correlation length 
in the KPZ-like synchronization landscape. 

The measured steady-state structure factor [Fig.^a)], obtained by simulating the BCS based on the exact rules for 
the evolution of the synchronization landscape, confirms the coarse-grained prediction for small k values, S(k) ~ 1/fc 2 . 
Figure^Jb) shows the corresponding spatial two-point correlation function G(l). Simulations of the BCS scheme in one 




FIG. 4: (a) The steady-state structure factor as a function of the wave number for the BCS scheme in ID. The small-fc course- 
grained prediction (consistent with the steady-state EW/KPZ universality class in ID) is indicated by a dashed line [Eq. JjJJ]. 
Note the log-log scales, (b) Steady-state spatial two-point correlation function. The straight line again indicates the asymptotic 
EW/KPZ behavior in one dimension [Eq. HH l. 

dimension yield scaling exponents that agree within error with the predictions of the KPZ equation [(58|, |(5j| [78j] . The 
time evolution of the width [Fig.^a)] shows that the growth exponent /3~l/3. Looking at the system-size dependence 
of the steady-state width [Figure Efb)], we find the roughness exponent a~l/2, consistent with the one-dimensional 
KPZ value, (w 2 )^N 2a ^N . The dynamic exponent values found from the width as a function of the cross-over time 
and z=a/{3 are the same, about 3/2. The inset in Fig. Ufa) shows that the scaled version of the width evolution 
by using the scaling exponents is consistent with the Family- Vicsek relation [Eq. J3J], although with relatively large 
corrections to scaling. 

The width distributions, P(w 2 ), have been introduced to provide a more detailed characterization of surface growth 
processes 0, |U EJ EH EE EH an d have been used to identify universality classes [i3 • The width distribution of 
rough surfaces belonging to the same universality class is governed by a universal scaling function $(2;), such that 
P(w 2 )—(w 2 )~ 1 ^(w 2 1 (w 2 )). can be calculated analytically for a number of models, including the EW class 

[8l| . The width distribution for the basic synchronization scheme is shown in Fig. EJc). Systems with iV>10 3 show 
convincing data collapse onto this exact scaling function. The inset in Fig. EJc) shows the same graph in log-normal 
scale to show the collapse at the tail of the distribution. The convergence to the limit distribution is very slow when 
compared to other microscopic models (such as the single-step model [&8L l&jj p belonging to the same universality 
class. 

Now we discuss our findings for the steady-state utilization of the BCS scheme. As stated above, the synchronization 
landscape of the virtual times belongs to the EW universality class in one dimension. This implies that the local slopes 
in the steady-state landscape are short-range correlated |72j . Hence the density of local minima in the synchronization 
landscape, and in turn the utilization, remains nonzero in the infinite system-size limit |33l l72j. For fixed N, the 
utilization drops from relatively higher initial value at early times to its steady-state value in a very short time 
[Fig.|S]. Further, the steady-state utilizations for various systems converge to the asymptotic system-size independent 
value. In ID, where a=l/2, the system-size dependence of the steady-state utilization must follow [Eq. ©] (u(N)) ~ 
(u(oo)) + co ^ st - , as confirmed by our direct simulations for the BCS scheme [inset of Fig. 0. For the KPZ model 
[Eq. (J7J)] (u(oo)) = l/4, since in the steady state, the local slopes are delta-correlated, resulting in a probability of 
1/4 for the configuration in Fig. Hfb), corresponding to a local minimum. For the actual BCS synchronization 
profile (w(oo)}~0.2464 [3^, [T^, as a result of non-universal short-range correlations present for the slopes in the 




FIG. 5: (a) Time evolution of the width for different system-sizes in the BCS scheme in ID. The inset shows the same data 
on scaled axes, {w 2 )/N 2a versus t/N z . Each curve has been obtained by averaging over at least fifty realizations, (b) The 
steady-state width of the time horizon for the one-dimensional BCS as a function of system-size. The dashed straight line 
represents the asymptotic one-dimensional KPZ/EW behavior, (w 2 (N)^~N 2a with a=l/2. (c) Scaled width distributions for 
the BCS scheme in ID. The exact asymptotic EW/KPZ width distribution |8l| is shown with a dashed line. The inset shows 
the same distributions on log-normal scales. 




FIG. 6: Utilization in the ID BCS scheme as a function of time for various system sizes. The inset shows the steady-state 
utilization as a function of the l/iV 2 ' 1-c ^ with the ID KPZ roughness exponent a=l/2. The dashed line is a linear fit to Eq. © 
with a=l/2, (u(N)} « 0.2464 + 0.2219/iV. 



specific microscopic model |8C|. Finally one can argue that the utilization for the BCS on regular lattices remains 
nonzero in the thermodynamic limit and it displays universal finite-size effects [73, UM, IZM IZE UM ■ Note that a small 
change with respect to the "worst case" (one-site-per PE) scenario in particular, hosting more than one site per PE, 
leads to utilizations that are close to the limiting value of unity [75l l77| , and hence are practical for actual PDES 
implementations |4Ct 143. |44| . 

In order to obtain an analytically tractable scalability model for the BCS scheme, Greenberg et al. introduced the 
if-random interaction network model |87T |. In this model at each update attempt PEs compare their local simulated 
times with those of a set of K randomly chosen PEs. This set is re-chosen for each update attempt (i.e., the network 
is "annealed"), even if a previous update attempt has failed. It was shown that in the limit of i— >oo and N— >oo, the 
utilization (or the average rate of progress) converges to a non-zero constant, 1/(K + 1). They also suggested that 
the scaling properties of if-random model as t— >oo and ./V— >oo are universal and hold for regular lattices as well. 
But changing the interaction from nearest neighbor PEs on a regular lattice to randomly chosen PEs changes the 
universality class of the time horizon. Simply put, the underlying topology has crucial effects on the universal behavior 
of the time horizon. The random (annealed) interaction topology of the X-random model results in a mean-field-like 
behavior, where the simulated time surface is uncorrelated and has a finite width in the limit of an infinite number of 
PEs. Their conjecture for the width does not hold, thus, the BCS scheme for regular lattices cannot be equivalently 
described by the if -random model (at least not below the upper critical dimension of the KPZ universality class 
|88|). However, we were inspired by |87j to change the communication topology of the PEs by introducing random 
links in addition to the necessary short-range connections. In the next section we present our modification to the 
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original conservative scheme on regular lattices to achieve a fully scalable algorithm where both scalability conditions 
are satisfied: the simulation has a nonzero progress rate and the width of the synchronization landscape is finite. 



III. SMALL- WORLD SYNCHRONIZATION SCHEME IN ID 



The divergent width for the larger systems, discussed in the previous section, is the result of the divergent lateral 
correlation length £ of the virtual time surface, reaching the system-size N in the steady-state |68l 174. 17.4 l7fil l77j . 
To de-correlate the simulated time horizon, first, we modify the virtual communication topology of the PEs. The 
resulting communication network must include the original short-range (nearest-neighbor) connections to faithfully 
simulate the dynamics of the underlying system. In the modified network, the connectivity of the nodes (the number 
of neighbors) should remain non-extensive (i.e., only a finite number of virtual neighbors per node is allowed). This is 
in accordance with our desire to design a PDES scheme where no global intervention or synchronization is employed 
(PEs can only have O(l) communication exchanges per step). It is clear that the added synchronization links (or 
at least some of them) have to be long range. Short range links alone would not change the universality class and 
the scaling properties of the width of the time horizon. One can satisfy this condition by selecting the additional 
links called "small-world" links randomly among all the nodes in the network. Also, fluctuations in the individual 
connectivity should be avoided for load balancing purposes, i.e., requiring the same number of added links (e.g., one) 
for each node is a reasonable constraint. We then choose the extra synchronization links in such a way that each PE 
is connected to exactly one other PE via a "quenched" bidirectional link [Fig. db)] . 

One of the basic structural characteristics of SW-like networks is the "low degree of separation" between the nodes. 
The most commonly used observables to analyze this property are the average shortest path length, 5 avg (N), and the 
maximum shortest path length, S rnax (N). The shortest path length between two nodes is defined as the minimum 
number of nodes needed to visit in order to go from one of the nodes to the other. The average shortest path length 
is obtained by averaging the above quantity between all possible pairs of nodes in a given network. The maximum 
shortest path length, also known as the diameter of the network, is the length of the longest among the shortest paths 
in the network. Both of these observables scale logarithmically with the system-size N in SW-like networks [8!|. The 
system-size dependence of these path lengths for our one-dimensional SW network is logarithmic as expected, see 
Fig.0 




FIG. 7: Average and maximum shortest path (diameter) as a function of the number of nodes for the SW synchronization 
network in ID, as described in the text. The latter is also referred to as the diameter of the network. The solid and dashed 
lines both indicate the logarithmic dependence. Note the normal-log scales. 

We now describe the modified algorithmic steps for the SW connected PEs 38] . In the SW conservative 
PDES scheme, in every parallel time step, each PE, with probability p, compares its local simulated time 
with its full virtual neighborhood, and can only advance if it is the minimum in this neighborhood, i.e., if 
Ti{t)< min{Tj_i(£), Tj+i(t), T r a\ (t)}, where r(i) is the random connection of PE i. With probability (1— p) each PE 
follows the original scheme, i.e., the PE then can advance if T i (^)<min{r i _ 1 (i), r i+1 (i)}. Our network model including 
the nearest neighbors and random SW links can be seen in Fig. db) ■ Note that the occasional extra checking of the 
simulated time of the random neighbor is not needed for the faithfulness of the simulation |38| . It is merely introduced 
to control the width of the time horizon. The occasional checking of the virtual time of the random neighbor (with rate 
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p) introduces an effective strength J=J(p) for these links. Note that this is a dynamic "averaging" process controlled 
by the parameter p and can possibly be affected by nonlinearities in the dynamics through renormalization effects. 
The exact form of J(p) is not known. The only plausible assumption we make for J is that it is a monotonically 
increasing function of p and is only zero when p—0. 

In what follows, we focus on the characteristics of the synchronization dynamics on the network. As we have seen 
for the one-dimensional regular network, the communication topology between the nodes (up to linear terms) leads to 
simple relaxation, governed by the Laplacian on the regular grid. Random communication links give rise to analogous 
effective couplings between the nodes, corresponding to the Laplacian on the random part of the network. Thus, the 
large-scale properties of the virtual time horizon of our SW scheme are governed by the effective Langevin equation 

d t n = v 2 f, - J2 J a{n - + ••■ + r)i{t) , (13) 

3 

where the ... stands for infinitely many non-linear terms (involving non- linear interactions through the random links 
as well), and is proportional to the symmetric adjacency matrix of the random part of the network: Jij=J(j>) 
if sites i and j are connected by a random link and Jy=0 otherwise. For our specific SW construction each node 
has exactly one random neighbor, i.e. there are no fluctuations in the individual connectivity (degree) of the nodes. 
Our simulations (to be discussed below) indicate that when considering the large-scale properties of the systems, the 
Laplacian on the random part of the network generates an effective coupling 7 to the mean value of the fluctuations 
in the synchronization landscape r |38|. At the level of the structure factor, it corresponds to an effective mass 7 (in 
a field-theory sense) 

S(k) cx — Lj , (14) 
7 + k 

where 7=7(p) is a monotonically increasing function of p with 7(0)=0. 

We emphasize that the above is not a derivation of Eq. i|14|) , but rather a "phenomenological" description of our 
findings. It is also strongly supported by exact asymptotic results for the (linear) EW model on SW networks, 
where the effect of the Laplacian on the random part of the network is to generate an effective mass [2(3, ■ The 
averaging over th e q uenched network ensemble, however, can introduce nontrivial scaling and corrections in the 
effective coupling [2(J, l2ll l22| . In our case, this is further complicated by the nonlinear nature of the interaction. The 
results of "simulating the simulation" , however, suggest that the dynamic control of the link strength and nonlinearities 
only give rise to a renormalized coupling and a corresponding renormalized mass. Thus, the dynamics of the BCS 
scheme with random couplings is effectively governed by the EW relaxation in a small- world |20l l2ll |22| . 

From Eq. I|14|l it directly follows that the lateral correlation length, in the infinite system-size limit, scales as 

£~7~ 1/2 , (15) 

i.e., becomes finite for all p ^0. The presence of the effective mass term in the structure factor in Eq. I|14l) implies 
that limfc^o S(k)<oa, that is, there are no large amplitude long-wavelength modes in the surface. Consequently, 
the width (w 2 ) — (l/N) J^k^o ^(^) ~ £ ^ s a ^ so nn ite. Our simulated time landscapes indeed show that they become 
macroscopically smooth when SW links are employed [Figs.|3Jb) and|3{c)], compared to the the same dynamics with 
only short-range links [Fig. Efa)]. 

In the simulations, we typically performed averages over 10-100 network realizations, and compared the results 
to those of individual runs. Our results indicate that the observables we studied (the width and its distribution, 
the structure factor, and the utilization) display strong self-averaging properties, i.e., for large enough systems, they 
become independent of the particular realization of the underlying SW network. Simulation results for the structure 
factor, S(k), for the SW synchronization scheme are shown in Fig.|SJa). If an infinitesimally small p is chosen, S(k) 
approaches a finite constant in the limit of fc^O, and in turn, the virtual time horizon becomes macroscopically 
smooth with a finite width. 

A possible (phenomenological) way to obtain the correlation length is to fit our structure factor data to Eq. H14fl . 
more specifically, by plotting 1/S(k) versus k 2 , which exhibits a linear relationship. By a linear fit, 7 is then the 
ratio of the intercept and the slope (inset in Fig.lSJa)). Alternatively, one can confirm that the massive propagator in 
Eq. (|14|l indeed leads to an exponential decay in the two-point correlation function G(l) from which the correlation 
length can also be extracted [Fig. EP>)]. In our case with a system-size ^=16384 it is £«27 for p=0.1 and £«1.6 for 
p=l. Fig. |SJc) shows the correlation length extracted from the structure factor S(k) as a function of p for different 
system-sizes. 

An alternative way to determine the correlation length is to attempt a finite-size scaling analysis of the width 
(w 2 ). There are two length scales in the system: the linear system size N and the correlation £ of an infinite system. 
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FIG. 8: (a) Structure factor for the ID SW synchronization scheme with p—0.1. The inset shows 1/S(k) vs. k 2 for small 
values of k, confirming the coarse-grained prediction Eq. 11411 . (b) The spatial two-point correlation function as a function of 
(the Euclidean) distance I between the nodes for two different values of p, indicating an exponential decay with an average 
correlation length £«27 and £«1.6 for p=0.1 and p=l, respectively. The inset shows the same data on log-normal scales, (c) 
Correlation length vs. p for different system-sizes. The dashed line corresponds to the estimate of the exponent s, ~ p~ s 
with s«0.84, in the small-p regime for an asymptotically infinite system. 
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FIG. 9: (a) The average steady-state width in the ID SW synchronization landscape as a function of the system size for different 
values of p in the range of [10 -3 , 10 -1 ]. The dashed line indicates the EW/KPZ scaling, corresponding to the small system-size 
behavior, (b) The average steady-state width in the ID SW synchronization landscape as a function of p for different values 
of N in the range of [40, 10 4 ]. The dashed line indicates the best-fit power law in the asymptotic large- N small-p regime to 
extract the correlation length exponent s, according to Eqs. I16H - 118|I . 



For p=0, (w"*)~N, while for p>0 and N^oo, (w 2 }~£. For non-zero p and finite N the scaling of the width can be 
expected [2l|, [28| to follow 

(w 2 ) - Ng(£/N) , (16) 

where g{x) is a scaling function such that 

For non-zero p and for sufficiently small systems (N<^.^(p)) one can confirm that the behavior of the width follows 
that of the system without random links (w 2 ) ~ TV [Fig. Et a )]- For large-enough systems, on the other hand, we can 
extract the p-dependence of the infinite-system correlation length as (w 2 ) ~ £(p) [Fig. Efb)], yielding 



(18) 



11 




FIG. 10: The scaled versions of Figs. EH a) and (b), as proposed by Eqs. ijlSl-lfflfy by plotting (w 2 ) /N vs. l/p s N with s=0.84. 
The data points are identical in (a) and (b) , but data points connected by a line have the same value of p in (a) and have the 
same value of TV in (b), as obtained by rescaling Figs.^b) and (b), respectively. The dashed line corresponds to the asymptotic 
small- x behavior of the scaling function g(x) [Eq. 1171 1. 




FIG. 11: (a) The average steady-state width as a function of system-size for different values of p in the ID SW synchronization 
scheme. The p=0 case corresponds to the purely ID BCS scheme (exhibiting EW/KPZ scaling, indicated by a dashed line) 
and is also shown for comparison. Steady-state width distributions in the ID SW synchronization scheme for (b) p=0.1 and 
for (c) p=l. The distributions were constructed using ten different network realizations, except for iV=10 6 , where only one 
realization was obtained due to computational limitations. All width distributions, however, indicated self-averaging. 



where s f» 0.84. 

We then studied the data collapse as proposed by Eq. (fT^f) by plotting (w 2 )/N vs. l/p s N. In fact, we performed 
this rescaling originating from both raw data sets Figs.^a) and (b). The resulting scaled data points in Fig, HUf a) 
and (b), of course, are identical in the two figures, but the lines connect data points with the same value of p in 
Fig. llUf a) and with the same value of TV in Fig. Ildf b). These scaled plots in Fig. ^| indicate that there are very 
strong corrections to scaling: data for larger p or smaller TV values peel off from the proposed scaling form Eq. i|17|) 
relatively quickly. These strong corrections are possibly the result of the nonlinear nature of the interaction between 
the nodes on the quenched network. We note that the purely linear EW model on identical networks exhibits the 
scaling proposed in Eqs. and H17fl without noticeable corrections to scaling [2lll28| . 

The non-zero 7, leading to a finite correlation length, £, ensures a finite width in the infinite system-size limit. Our 
simulations show that the width saturates to a finite value for p>0 [Fig. Illf al]. The distribution of the steady-state 
width P(w 2 ) changes from that of the EW/KPZ class to a delta function for non-zero values of p as the system size 
goes to infinity. Figure ITlT b) and (c) shows the width distributions for p=0.1 and p=l, respectively. The scaled width 
distributions (to zero mean and unit variance), however, exhibit the convergence to a delta function through nontrivial 
shapes for different values of p. For p=0.1 [Fig. I12f a)] the distributions appear to slowly converge to a Gaussian as 
the system-size increases. For p=l [Fig. HW bl]. the trend is opposite, up to the system sizes we could simulate; as the 
system-size increases, the distributions exhibit progressively non-Gaussian features (closer to an exponential) around 
the center, for up to iV=10 6 . Note that not only the average width (w 2 ), but also the full distribution P(w 2 ) was 
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FIG. 12: Steady-state width distributions for the ID SW synchronization scheme scaled to zero mean and unit variance for (a) 
p=0.1 and (b) p=l. The dashed curves are similarly scaled Gaussians for comparison. 



found to be self-averaging, i.e., independent of the particular realization of the underlying SW network, so the above 
effect is not due to insufficient averaging over network realizations. Since we were intrigued by the above change in 
the trend of the convergence (i.e., away from vs. toward the Gaussian), we carried out some exploratory simulations 
for a range of p values and evaluated the kurtosis and the skewness of the width distributions. The results shows 
that this change occurs at around p w 0.9. We could not conclude, however, whether this change corresponds to an 
actual transition and the emergence of a strong disordered-coupling regime, where the distribution of the width is a 
delta function, centered about a finite value, but, the limiting shape of the delta function is non-Gaussian, or to the 
development of strong non-monotonic finite-size effects. 




40000 60000 

N 



le+05 



FIG. 13: Steady-state utilization of SW synchronization network in ID as a function of system-size for three different values of 
p=0 (BCS), p=0.1 and p=l. 



The effect of the random communication links on the utilization can be understood as follows. According to the 
algorithmic rules, the virtual times of the full network neighborhood (including the random neighbor) are checked 
with probability p, while with probability (1— p) only short-ranged synchronization is employed. Thus, the average 
progress rate of the simulated times becomes 

(«) - (i-j>)(e(-^_i)e(0 i ))+p(e(-0 i _i)e(0 i )e(r r(i) -T i )> 

= (e(-^_ 1 )6(0 l )) - P [<e(-&_i)e(&)> - (e(-&_i)e(&)e(Tr(i) - Ti ))] . (19) 

Note that performing disorder averaging (over random network realizations) makes the right hand side independent 
of i. In the presence of the SW links the regular density of local minima (9(— <fe_i )8(<fo)) remains nonzero (in fact, 
increases, compared to the short-range synchronized BCS scheme) j33, E3, H3 ■ Thus, for an infinitesimally small 
p, the utilization, at most, can be reduced by an infinitesimal amount, and the SW- synchronized simulation scheme 
maintains a nonzero average progress rate. This can be favorable in PDES where scalable global performance requires 
both finite width and nonzero utilization. With the SW synchronization scheme, both of these objectives can be 
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achieved. For example, for p=0.1 (u)oo — 0.242, while for p=l (u)oo ~ 0.141. The steady-state utilization as a 
function of system size for various values of p can be seen in Fig. 1131 

IV. BASIC CONSERVATIVE SYNCHRONIZATION SCHEME IN 2D 

A natural generalization to pursue is the same synchronization dynamics and the associated landscapes on net- 
works embedded in higher dimensions. One might ask whether PDES of two-dimensional phenomena exhibit kinetic 
roughening of the virtual time horizon. Preliminary results indicated that this is the case |53,|61|. In this section we 
give detailed results when the BCS scheme is extended onto a two-dimensional lattice (Npe=N 2 , N being the linear 
size) in which each node has four nearest neighbors. We consider a system with periodic boundary conditions in both 
axis as can be seen in Fig. I14f a). The same microscopic rules, i.e. each node increments its local simulated time by 
an exponentially distributed random amount when it is a local minima among its nearest neighbors, are applied to 
this lattice. As in the one-dimensional case, during the evolution of the local simulated times correlations between 
the nodes develop in the system. One observes rough time surfaces in the steady-state after simulating this system. 
Figure [ToT a) shows the contour plot of the simulated time surface for the BCS scheme in 2D. In 2D as well, we 



(a) (b) 




FIG. 14: Communication topologies in 2D. (a) 2D regular network, where each node is connected to its four nearest neighbors, 
(b) SW synchronization network. One random link per node is added on top of the 2D regular network. Red arrows show the 
bidirectional random links between the nodes. 



(a) (b) (c) 




FIG. 15: Synchronization landscapes as contour plots for the 2D BCS on regular and SW networks of 128x128 nodes, (a) for 
the BCS scheme on a regular lattice with only nearest-neighbor connections (equivalent to p=0); (b) for p=0.1; (c) for p—1. 

observe kinetic roughening of the BCS scheme. The simulated time surface for a finite system initially roughens with 
time in a power-law fashion. It then saturates after some system-size dependent crossover time to its system-size 
dependent steady-state value, as shown in [Fig. 116( a)]. Our estimate for the growth exponent in the early-time regime 
is /3=0.125, significantly smaller than in one dimension. 
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The roughness exponent a for KPZ-like systems have been measured and estimated in a number of experiments 
and simulations [68|. Since exact exponents for the higher-dimensional KPZ universality class are not available, for 
reference, we compare our results to a recent high-precision simulation study by Marinari et al. |9ll | on the restricted 
solid-on-solid (RSOS) model H2, a model believed to belong to the KPZ class. They found in Ref. [13 that a~0.39 
for the 2D RSOS roughness exponent. While our simulations of the virtual time horizon show kinetic roughening 
in Fier. I16f ab the scaled plot, suggested by Eq. J3J, indicates very strong corrections to scaling for the BCS in 2D 
[Fig. 116( a) inset]. Figure I16f b) and (c) also indicates that the (KPZ) scaling regime is approached very slowly, which 
is not completely unexpected: for the ID BCS scheme as well, convergence to the steady-state roughness exponent 
Fig. |3 a) and to the KPZ width distribution Fig.[5jb) only appears for linear system sizes N>O(10 3 ). Here, for the 
2D case, the asymptotic roughness scaling [Fig. lToT b)] and width distribution [Fig. lrpT c')] has not been reached for the 
system sizes we could simulate (up to linear system size 7V=2048). Nevertheless the trend in the finite-size behavior, 
and the identical microscopic rules (simply extended to 2D) suggest that 2D BCS landscape belongs to the 2D KPZ 
universality class. 
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FIG. 16: (a) Time evolution of the width in 2D BCS scheme. The dashed line indicates the power-law behavior of the width 
before saturation with a growth exponent /3~0.125. The inset shows the scaled plot {w 2 ^ /N 2a vs. t/N z . (b) Steady-state 
width of the 2D BCS scheme as a function of the linear system-size. The dashed line corresponds to the asymptotic 2D KPZ 
scaling with roughness exponent 2q=0.78 as obtained by high-precision simulations of the RSOS model |9l|l . Note the log-log 
scales, (c) The scaled width distributions for the 2D BCS scheme. The solid curve is the asymptotic 2D KPZ scaled width 
distribution, again from high-precision RSOS simulations |8al . Note the log-normal scales. 



For further evidence, we also constructed the structure factor for the 2D BCS steady-state landscape. As shown in 
Fig. I17f ab S(k x ,k y ) exhibits a strong singularity about k = 0. For further analysis, we exploited the symmetry of 

S(k x ,k y ) that it can only depend on |k| = \Jk% + ky . Hence, we averaged over all directions having the same wave 

number |k| to obtain 5(|k|). For small wave numbers we found that it diverges as 

5 (l k D - - (2°) 

with a = 0.39, as shown in Fig. I17f bh This is consistent with the small-fc behavior of the structure factor of the 
2D KPZ universality class with roughness exponent a~0.39 As noted above, the scaling of the width and its 

distribution exhibited very slow convergence to those of our "reference" -KPZ system, the RSOS model This 
is likely the effect of the non-universal and surprisingly large contributions coming from the large-A: modes, leading to 
very strong corrections to scaling for the system sizes we were able to study in 2D. Looking directly at the small-|k| 
behavior of *S'( |k| ) is, of course, undisturbed by the larger-|k| modes, hence the relatively good agreement with the 
2D KPZ scaling. 

The steady-state utilization (density of local minima) in the 2D BCS synchronization landscape approaches a 
nonzero value in the limit of an infinite number of nodes, (u(oo))~0.1201 as can be seen in Fig. 1181 This is consistent 
with the general approximate behavior (m(oo)} ~ const. /d on hypercubic lattices in d dimension |6lll87| . i.e., (u(oo)) is 
approximately inversely proportional to the coordination number. As shown in Fig. 1181 (Inset), for a two-dimensional 
BCS scheme, the steady-state utilization scales as [Eq. ©] (u(N)) ~ (u(oo)) + , where we have used the 2D 
KPZ roughness exponent a=0.39 |9l| . 
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FIG. 17: Structure factor for the BCS scheme in 2D. (a) as a function of the wave number components k x and k y . (b) as 
a function of the magnitude of the wave number for different system-sizes. The dashed line shows the asymptotic 2D KPZ 
behavior for small values of |k| [Eq. 1201 1 with a=0.39 [9l| . Note the log-log scales. 
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FIG. 18: The time evolution of the steady-state utilization in 2D BCS scheme for various system sizes. The inset shows the 
steady-state utilization in the 2D BCS scheme as a function of l/iV 2(1 - Q) with the 2D KPZ roughness exponent a=0.39. The 
dashed line is a linear fit to Eq. @ with a=0.39, (u(N)} w 0.1201 + 0.1585/jV 122 . 



SMALL- WORLD SYNCHRONIZATION SCHEME IN 2D 



The de-synchronization (roughening of the virtual time horizon) again motivates the introduction of the possibly 
long-range, quenched random communication links on top of the 2D regular network. Each node has exactly one 
(bi-directional) random link as illustrated in Fig. I14f b). The actual "microscopic" rules are analogous to the ID SW 
case: with probability p each node will check the local simulated times of all of its neighbors, including the random 
one, and can increment its local simulated time by an exponentially distributed random amount, only if it is a "local" 
minimum (among the four nearest neighbors and its random neighbor). With probability (1— p), only the four regular 
lattice neighbors are checked for the local minimum condition. 

The effect of the synchronization through the random links is, again, to stop kinetic roughening and to suppress 
fluctuations in the synchronization landscapes. Contour plots of the synchronization landscapes are shown in Fig- 
ure ll5f b) and (c) for p—0.1 and p=l, respectively. Our results indicate that for any nonzero p the width of the surface 
approaches a finite value in the limit of N^oo [Fig. H9f a)]. At the same time, the width distribution approaches a 
delta- function in the large system-size limit as shown in Figs.^Jb) and liyf cl. The scaled distributions (to zero mean 
and unit variance) again show that at least for the finite systems we observed, the shape of these distribution differs 
from a Gaussian. The deviation from the Gaussian around the center of the distribution is stronger for a larger value 
of p where the influence of the quenched random links are stronger. Note that for the ID SW landscapes as well, the 
width distribution only displayed a crossover to Gaussian behavior for smaller values of p and for very large linear 
system sizes (iV>C(10 4 ). In the 2D SW case, these linear system sizes are computationally not attainable, and the 
convergence to a Gaussian width distribution, or the existence of an inherently different strong-disorder regime (as a 
result of strong random links), remains an open question. 
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The underlying reason for the finite width is again a finite average correlation length between the nodes. The 2D 
structure factor exhibits a massive behavior, i.e., 5(|k|) approaches a finite value in the limit of k— *0 [Figs. l2~TT a') and 
l2*TT b)]. For small wave numbers, the approximate behavior of the structure factor is again 5(|k|)ocl/(|k| 2 +7) as can 
be seen in the inset of Fig. I21f bh with strong finite-size corrections to 7. The relevant feature of the synchronization 
dynamics on a SW network is the generation of the effective mass 7. Nonlinearities can give rise to a renormalized 
mass, but the relevant operator is the Laplacian on the random network. 

In the 2D SW synchronization scheme the steady-state utilization is smaller than its purely 2D counterpart (BCS 
in 2D), as a result of the possible additional checking with the random neighbors. For small values of p, however, it 
is reduced only by a small amount, and remains nonzero in the limit of an infinite number of nodes [Fig. I22j . For 
example, forp=0.1 (ujao^O.lWS, while for p—1 (u) oo ~0.084. 



(a) (b) (c) 




FIG. 19: (a) The average steady-state width as a function of linear system size for different p values in the 2D SW synchronization 
scheme. The data for p—0 corresponds to the 2D BCS scheme on a regular network with only nearest-neighbor connections. 
Steady-state width distributions in the 2D SW synchronization scheme for (b) p=0.1 and for (c) p=l for various system sizes. 




FIG. 20: Steady-state width distributions in the 2D SW synchronization scheme scaled to zero mean and unit variance for (a) 
p—0.1 and for (b) p=l. The dashed curves are similarly scaled Gaussians for comparison. 



VI. SUMMARY AND CONCLUSIONS 



We studied the large-scale properties of the synchronization landscapes of PDES, applicable to certain distributed 
and networked computer systems for a large number of scientific problems. We investigated the effects of SW-like 
additional communication links between the nodes added on the top of ID and 2D regular networks. With purely 
regular short-range connections the synchronization landscape is rough and belongs to the KPZ universality class. 
This property can hinder efficient data management. To suppress diverging fluctuations (as a function of the number 
of the nodes) we constructed a SW synchronization scheme where the nodes also synchronize with a randomly chosen 
one with a possibly infinitesimal frequency. For an infinitesimally small rate of communications over the random links, 




FIG. 21: (a) Steady-state structure factor for the 2D SW scheme as a function of the components k x and k y for p—1. (b) 
Structure factor as a function of \k\ for the 2D SW synchronization scheme for p=l. The inset shows 1/S(k) vs. k 2 for small 
values of |fc|. 




FIG. 22: Steady-state utilization of SW synchronization network in 2D as a function of system-size for three different values of 
p=0 (BCS), p=0.1 and p=l. 

this scheme results in a progress rate reduced only by an infinitesimal amount, while the width of the time horizon 
becomes finite in the limit of infinitely many PEs. Thus, the scheme is fully scalable as the PEs make nonzero and 
close-to- uniform progress without global intervention. In obtaining our results, we used coarse-grained arguments 
to identify the universality class of the PDES time horizon, and confirmed those predictions by actually simulating 
the simulations, based on the exact algorithmic rules. Our study of the width distributions of SW-synchronized 
PDES landscapes also revealed that while they converge to delta-functions, centered about a finite value, their shape 
progressively becomes non-Gaussian for larger values of p. We could not conclude if these observations indicate the 
emergence of a strong-disorder regime, or strong non-monotonic finite size effects for our attainable system sizes. 
Future work may address these questions. 

In this work we considered the simplest (and in some regards, the worst case) scenario, where each nodes carries one 
site of the underlying physical system, hence synchronization with nearest neighbor PEs is required at every step . In 
actual parallel implementations the efficiency can be greatly increased by hosting many sites by each PE [4(1 liM |44j . 
That way, communication between PEs is only required when local variables are to be updated on the boundary 
region of the sites hosted by the PEs (within the finite range of the interactions) . While the above procedure clearly 
increases the utilization and reduces the actual communication overhead, it gives rise to an even faster growing early 
time regime in the simulated time horizon |93j . Since the PEs rarely need to synchronize, up to some crossover time, 
the evolution of the time horizon is governed by random deposition 68], a faster roughening growth, before eventually 
crossing over to the KPZ growth and a subsequent saturation. 

Our findings are also closely related to critical phenomena and collective phenomena on networks yO, |9J| . In 
particular, in recent years, a number of prototypical models have been investigated on SW networks |l2i IbH . llfiL ITri 
El E El IM El El El Hljl IHHl HI HI M mm DM GM G"q1 . Of these^ the ones most closely related to 
our work are the XY modei[lJ the EW model HflUllla, diffusion m El El El IS El , and current flow 
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[28] on SW networks. The findings suggest that systems without inherent frustration exhibit (strict or anomalous) 
|2Ct I211 I22I I2H l98| mean-field-like behavior when the original short-range interaction topology is modified to a SW 
network. In essence, the SW couplings, although sparse, induce an effective relaxation to the mean of the respective 
local field variables, and in turn, the system exhibits a mean-field-like behavior l98l . This eff ect is qualitatively similar 
to those observed in models with "annealed" long-range random couplings |25l Il03l Il04l . but on (quenched) SW 
networks, the scaling properties can differ from those with annealed interactions |2ft l2lL l2*2t I27I losj . 
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